Multiperm: shuffling multiple sequence alignments while approximately preserving dinucleotide frequencies

Authors

  • Parvez Anandam
  • Elfar Torarinsson
  • Walter L. Ruzzo

Abstract

SUMMARY: Assessing the statistical significance of structured RNAs predicted from multiple sequence alignments relies on the existence of a good null model. We present here a random shuffling algorithm, Multiperm, that preserves not only the gap and local conservation structure in alignments of arbitrarily many sequences, but also the approximate dinucleotide frequencies. No shuffling algorithm that simultaneously preserves these three characteristics of a multiple (beyond pairwise) alignment has been available to date. As one benchmark, we show that Multiperm produces shuffled exonic sequences whose folding free energies are closer to those of the native sequences than are those of shuffled alignments that do not preserve dinucleotide frequencies.

AVAILABILITY: The Multiperm GNU C++ source code is available at http://www.anandam.name/multiperm
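The abstract contrasts Multiperm with shuffles that keep gaps and conservation but not dinucleotide frequencies. The Python sketch below illustrates that kind of baseline so the difference is concrete. It is not Multiperm's algorithm. The helper names `column_class`, `shuffle_columns` and `dinucleotide_counts` are hypothetical, and the class definition (same gap pattern plus the same pattern of which rows carry identical residues) is an assumption chosen for illustration. Columns are swapped only within a class, so every position keeps its gap and identity pattern and every row keeps its base composition, but nothing constrains which bases end up adjacent.

```python
import random
from collections import Counter, defaultdict


def column_class(column):
    """Hypothetical class key for one alignment column.

    Gaps stay '-', and residues are relabeled 0, 1, 2, ... in order of first
    appearance, so 'AAG-' and 'CCU-' share the key (0, 0, 1, '-'): the same
    gap pattern and the same pattern of which rows agree.
    """
    labels = {}
    key = []
    for ch in column:
        if ch == '-':
            key.append('-')
        else:
            # len(labels) is evaluated before insertion, so new residues get 0, 1, ...
            key.append(labels.setdefault(ch, len(labels)))
    return tuple(key)


def shuffle_columns(alignment, rng=None):
    """Baseline shuffle (assumption, not Multiperm): permute columns within classes.

    Gap structure, per-column identity patterns and per-row base composition are
    preserved exactly; dinucleotide counts are not controlled at all.
    """
    if rng is None:
        rng = random.Random()
    ncols = len(alignment[0])
    columns = [''.join(row[j] for row in alignment) for j in range(ncols)]
    # Group column indices by their class key.
    groups = defaultdict(list)
    for j, col in enumerate(columns):
        groups[column_class(col)].append(j)
    new_columns = [None] * ncols
    for idxs in groups.values():
        donors = idxs[:]
        rng.shuffle(donors)
        # Each slot receives a column drawn from the same class.
        for dst, src in zip(idxs, donors):
            new_columns[dst] = columns[src]
    # Transpose the permuted columns back into rows.
    return [''.join(col[i] for col in new_columns) for i in range(len(alignment))]


def dinucleotide_counts(row):
    """Count adjacent base pairs in one row after removing gap characters."""
    s = row.replace('-', '')
    return Counter(s[k:k + 2] for k in range(len(s) - 1))


if __name__ == "__main__":
    aln = ["ACGU-ACGUA",
           "ACGA-ACGUA",
           "ACGUUACG-A"]
    shuffled = shuffle_columns(aln, random.Random(1))
    for before, after in zip(aln, shuffled):
        same = dinucleotide_counts(before) == dinucleotide_counts(after)
        print(before, after, "dinucleotides preserved:", same)
```

Comparing `dinucleotide_counts` before and after the shuffle shows how far a conservation-only null model can drift from the native sequences' neighbor statistics; that drift is the gap Multiperm is described as closing.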

Similar resources

SSMAL: similarity searching with alignment graphs

MOTIVATION We want to provide biologists with a fast and sensitive scanning tool for searching local alignments of a protein query sequence against databases of protein multiple alignments, such as ProDom. Conversely, we want to provide a tool for locally aligning a protein multiple alignment query against a protein database such as SWISSPROT. RESULTS We developed the program SSMAL (Shuffling...

An Introduction To Motif Based Functional Classification of Large Protein Families

Many methods of clustering proteins within large protein families either build up from pairwise sequence alignments or rely solely on hierarchical clustering methods. While these methods can be incredibly useful, they may not efficiently discover small regions of similarity in large multidomain proteins, and they may miss functional similarities that arose due to domain shuffling or convergent ...

Improved alignment of nucleosome DNA sequences using a mixture model

DNA sequences that are present in nucleosomes have a preferential approximately 10 bp periodicity of certain dinucleotide signals, but the overall sequence similarity of the nucleosomal DNA is weak, and traditional multiple sequence alignment tools fail to yield meaningful alignments. We develop a mixture model that characterizes the known dinucleotide periodicity probabilistically to improve t...

AL2CO: calculation of positional conservation in a protein sequence alignment

MOTIVATION Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more cons...

Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage.

The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is muc...

Journal title:
  • Bioinformatics

Volume 25, Issue 5

Pages: -

Publication date: 2009